MCP tool: compare_to_history
What this tool does
The base-rate matcher is the platform’s flagship analytical surface — “when markets looked like this in the past, what happened next.” It runs over the full daily_signals history, scores each historical row against today’s signal vector using weighted Euclidean distance (or legacy hard-bin equality), and returns the forward 1d / 3d / 5d SPY return distribution across the top-K analogues.
compare_to_history is the MCP exposure of that matcher. An AI agent calls it with either today’s live signal dict or a hypothetical scenario (“what if VIX were 35 and DIX were 0.38?”) and reads back the analogue forward-return stats. No dashboard scrape, no manual API juggling — one tool, one round-trip, the same numbers the dashboard surfaces.
The tool is a thin wrapper. It does not own matcher logic. Every numeric decision — distance metric, dim trust, recency decay, bootstrap CI — lives in app/signals/base_rates.py and reaches the tool through compute_base_rates(). When the matcher evolves (Stream C2 hybrid tags + Euclidean), the tool inherits the change without a client edit.
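As a rough illustration of the soft-mode idea described above, the sketch below scores history rows with a weighted Euclidean distance and keeps the closest K. It is not the code in app/signals/base_rates.py: the function names, the flat per-dim weight map, and the absence of any normalization or recency weighting are assumptions made for brevity.

```python
import math
from typing import Dict, List, Mapping, Optional, Tuple

Row = Mapping[str, Optional[float]]


def weighted_distance(today: Row, row: Row, dim_weights: Mapping[str, float]) -> float:
    """Weighted Euclidean distance over the dims present in both vectors."""
    total = 0.0
    for dim, weight in dim_weights.items():
        a, b = today.get(dim), row.get(dim)
        if a is None or b is None:
            continue  # a missing dim contributes nothing to the distance sum
        total += weight * (a - b) ** 2
    return math.sqrt(total)


def top_k_analogues(
    today: Row, history: List[Dict], dim_weights: Mapping[str, float], k: int
) -> List[Tuple[float, Dict]]:
    """Return the k history rows closest to today, nearest first."""
    scored = [(weighted_distance(today, row, dim_weights), row) for row in history]
    scored.sort(key=lambda pair: pair[0])  # sort on distance only, never on the row dict
    return scored[:k]
```

The forward 1d / 3d / 5d statistics would then be aggregated over the returned rows; that step is left to compute_base_rates().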
C2 mode dispatch (2026-05)
MATCHER_VERSION bumped to 2026.05.8-hybrid-13d-recency730 when Stream C2 landed. Three modes joined the catalog alongside the legacy hard / soft:
- euclidean: alias for soft. Same behaviour; the new name makes “what does this actually do” clearer to a reasoning model browsing the schema.
- tag: pre-filter on a five-component coarse subset (gex, energy_regime, dix, vix_regime, zero_dte_pcr). Returns the matching set ranked by recency. No Euclidean leg.
- hybrid: tag pre-filter then Euclidean rank within the filtered set. Option C per DOCTRINE §6 Q1. This is the directional default; the current default stays soft per DOCTRINE P8 (additive, not destructive; the calibration loop decides if/when hybrid flips to default).
When the tag filter under-shoots _HYBRID_MIN_ANALOGUES (30 analogues) the matcher falls back to Euclidean over the unfiltered history. The response carries hybrid_fallback=true plus a hybrid_fallback_reason string so consumers see the degradation. Per DOCTRINE P0 — no silent fallback.
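The fallback rule can be sketched as follows. This is a minimal illustration, assuming each history row carries a pre-computed "tags" dict of coarse bins (a hypothetical layout) and that "under-shoots" means fewer than the minimum; the wording of the reason string is invented here, and only the hybrid_fallback / hybrid_fallback_reason field names come from the text above.

```python
from typing import Dict, List, Mapping, Tuple

HYBRID_MIN_ANALOGUES = 30  # stands in for _HYBRID_MIN_ANALOGUES
TAG_DIMS = ("gex", "energy_regime", "dix", "vix_regime", "zero_dte_pcr")


def hybrid_candidates(
    today_tags: Mapping[str, str],
    history: List[Dict],
    min_analogues: int = HYBRID_MIN_ANALOGUES,
) -> Tuple[List[Dict], Dict]:
    """Return (rows to rank by Euclidean distance, fallback fields for the response)."""
    filtered = [
        row
        for row in history
        if all(row.get("tags", {}).get(dim) == today_tags.get(dim) for dim in TAG_DIMS)
    ]
    if len(filtered) >= min_analogues:
        return filtered, {"hybrid_fallback": False}
    # Too few tag matches: rank the unfiltered history and say so loudly.
    reason = f"tag filter matched {len(filtered)} rows, below minimum {min_analogues}"
    return list(history), {"hybrid_fallback": True, "hybrid_fallback_reason": reason}
```

Returning the fallback fields alongside the candidate set keeps the degradation visible to whoever assembles the response, rather than burying it in a log line.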
Three design decisions
These are locked by D9 in IDEAS.md. Each was a real fork; the rejected options are recorded for the same reason DOCTRINE.md records D1–D16 — so a future session understands why we chose what we chose.
1. Argument shape — full signal dict
Options considered.
- Full signal dict. Caller passes {vix_close: 22.4, dix: 0.46, regime: "CAUTIOUS", ...}, any subset of daily_signals columns. Powerful: the agent can probe arbitrary combinations including dims the dashboard doesn’t expose. Risk: the agent can introduce typos or unknown dimensions and the matcher silently ignores them, masking drift.
- Constrained signal-state pairs. Tool exposes a narrow enum of matchable dims (vix_close, dix, hy_oas, …) and rejects anything else. Safer; forces the agent to know what’s matchable. Costs the open-ended “what if” exploration that makes the tool useful to a reasoning model.
Decision — full dict, validated against signal_definitions.json::api_field_path. Unknown keys are accepted but logged at WARNING and ignored by the matcher. The matcher already handles missing dims gracefully (each contributes nothing to the distance sum); silent-drop is the existing behavior. The new bit is the WARNING log so unknown keys are loud in observability without breaking the call.
Rationale. Keeps the tool open to hypothetical scenarios. Loud-fails silent dimension drift (a renamed column showing up as “unknown key” in logs is exactly the signal we want). Matches DOCTRINE P0 — no silent degradation. The validator is one pass over the registry; cheap.
2. Return shape — summary by default, analogue dates on request
Options considered.
- Summary only. Return the statistical headline: mean_forward_return, confidence_interval, sample_size, regime_breakdown. Small response. Good enough for an agent that wants the read and moves on.
- Summary plus analogue dates. Every call also returns the list of historical dates that matched, with per-analogue weight and forward returns. Lets the agent drill in (“show me what happened on the three closest historical analogues”). Larger payload: 100 analogues × 4 fields adds real bytes.
Decision — both, gated by include_analogue_dates: bool = false. Default false to keep responses small. When true, the response appends analogue_dates: [{date, distance, weight, forward_1d, forward_3d, forward_5d}, ...] — one entry per analogue, sorted by distance ascending. The existing HTTP route already supports this via a query flag; the MCP tool surfaces the same pattern.
Rationale. Tool agents and human-facing agents have different needs. A scheduling agent calling compare_to_history every 15 minutes wants the headline cheap; a research agent building a thesis wants the dates. Default optimizes for the common case; the flag opens the deeper read without two tools to maintain.
3. Mode selection — respect the live default, accept overrides
Options considered.
- Force one mode. Always run hybrid (or always soft, or always hard). Predictable but freezes the tool against a single matcher version — when C2 ships the hybrid mode, every MCP client has to learn the change.
- Respect the live default. Read whatever mode the HTTP route defaults to today. When the matcher upgrades and MATCHER_VERSION bumps, the tool picks up the new default automatically.
Decision: respect the live default, accept the same overrides as the HTTP route. Today that means mode="soft", top_k=100, recency_half_life_days=730, backfill_weight=1.0, with dim_trust as an optional override dict. Stream C2 added hybrid as an opt-in mode without changing the default; if the calibration loop later flips the default to hybrid, the tool picks it up with no client change required.
Rationale. Future-proof. Clients never have to reason about which matcher version they’re calling — they get whatever the platform’s current best default is. The override knobs are there for agents that want a specific comparison (e.g. mode="hard" to compare a soft-mode prediction against the legacy bin-equality matcher).
Argument schema
The tool signature mirrors the HTTP route surface plus the signals_dict argument. JSON schema:
{
"name": "compare_to_history",
"description": "Match a live or hypothetical signal vector against daily_signals history. Returns forward-return distributions (1d / 3d / 5d SPY) for similar historical setups.",
"input_schema": {
"type": "object",
"properties": {
"signals_dict": {
"type": "object",
"description": "Signal vector to match against history. Keys must be daily_signals column names (validated against signal_definitions.json). Unknown keys are accepted but logged at WARNING and ignored by the matcher. Omit to use the latest live row from daily_signals.",
"additionalProperties": true
},
"horizon": {
"type": "integer",
"enum": [1, 3, 5],
"description": "Which forward-return horizon to highlight. Optional; the response always carries all three (forward_1d, forward_3d, forward_5d). When set, the response also includes a top-level headline reading for the requested horizon."
},
"mode": {
"type": "string",
"enum": ["soft", "hard", "euclidean", "tag", "hybrid"],
"description": "Matcher mode. Omit to use the live default (currently 'soft'). 'soft' / 'euclidean' are aliases — weighted Euclidean distance over the full signal vector. 'hard' = legacy bin-equality. 'tag' (C2, 2026-05) = coarse-tag pre-filter only, ranked by recency. 'hybrid' (C2) = tag pre-filter then Euclidean rank within the filtered set; option C per DOCTRINE §6 Q1. When the tag filter under-shoots _HYBRID_MIN_ANALOGUES the matcher falls back to Euclidean and surfaces hybrid_fallback=true on the response (loud-fail per DOCTRINE P0)."
},
"top_k": {
"type": "integer",
"minimum": 1,
"maximum": 500,
"description": "Euclidean / tag analogue-set cap. Omit to use the live default (currently 100)."
},
"recency_half_life_days": {
"type": "integer",
"minimum": 0,
"description": "Exponential decay of analogue weight by trade-date age. 0 disables decay. Omit to use the live default (currently 730 = 2-year half-life)."
},
"backfill_weight": {
"type": "number",
"minimum": 0.0,
"maximum": 1.0,
"description": "Weight applied to rows tagged report_filename='historical-backfill'. Live rows always weight 1.0. Omit to use the live default (currently 1.0)."
},
"dim_trust": {
"type": "object",
"description": "Per-dim trust multiplier override (e.g. {'pct_above_50sma': 1.0, 'hy_oas': 1.0}). Pass {} to disable the default proxy-trust down-weighting entirely. Omit to use the live _DIM_TRUST_DEFAULT.",
"additionalProperties": { "type": "number" }
},
"include_analogue_dates": {
"type": "boolean",
"default": false,
"description": "When true, the response appends analogue_dates: [{date, distance, weight, forward_1d, forward_3d, forward_5d}, ...]. Adds payload size; off by default."
}
}
}
}
Validation rules
- Unknown signals_dict keys. Every key is checked against signal_definitions.json::api_field_path and the daily_signals column set. Unknown keys log a single WARNING event per call (logger.warning("compare_to_history: unknown signal key %r ignored", key)) and drop out of the call to compute_base_rates(). The matcher already tolerates missing dims; this is the existing graceful path.
- Type coercion. Numeric dims that arrive as strings ("22.4") are coerced once; failure logs a WARNING and drops the key. Categorical dims are taken verbatim.
- Missing signals_dict entirely. When omitted, the tool reads SELECT * FROM daily_signals ORDER BY trade_date DESC LIMIT 1, the same path the HTTP route uses. This is the “match today against history” common case.
- Override-knob bounds. top_k, recency_half_life_days, backfill_weight use the same bounds the HTTP route declares. Out-of-bounds values raise a structured error before reaching the matcher.
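The rules above might be implemented along these lines. This is a sketch under stated assumptions: KNOWN_KEYS and NUMERIC_KEYS are tiny illustrative stand-ins for the registry in signal_definitions.json, ToolArgumentError is a hypothetical name for the structured error, and the bounds are copied from the argument schema.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("mcp_server.tools.public")

# Illustrative stand-ins; the real sets would be loaded from signal_definitions.json.
KNOWN_KEYS = {"vix_close", "dix", "hy_oas", "regime"}
NUMERIC_KEYS = {"vix_close", "dix", "hy_oas"}


class ToolArgumentError(ValueError):
    """Hypothetical structured error raised before the matcher is reached."""


def validate_signals(signals_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Drop unknown keys (loudly) and coerce numeric strings once."""
    clean: Dict[str, Any] = {}
    for key, value in signals_dict.items():
        if key not in KNOWN_KEYS:
            logger.warning("compare_to_history: unknown signal key %r ignored", key)
            continue
        if key in NUMERIC_KEYS and isinstance(value, str):
            try:
                value = float(value)
            except ValueError:
                logger.warning("compare_to_history: key %r not numeric, dropped", key)
                continue
        clean[key] = value  # categorical dims pass through verbatim
    return clean


def check_bounds(
    top_k: Optional[int] = None,
    recency_half_life_days: Optional[int] = None,
    backfill_weight: Optional[float] = None,
) -> None:
    """Raise before any DB work if an override falls outside the schema bounds."""
    if top_k is not None and not 1 <= top_k <= 500:
        raise ToolArgumentError(f"top_k must be in [1, 500], got {top_k}")
    if recency_half_life_days is not None and recency_half_life_days < 0:
        raise ToolArgumentError("recency_half_life_days must be >= 0")
    if backfill_weight is not None and not 0.0 <= backfill_weight <= 1.0:
        raise ToolArgumentError("backfill_weight must be in [0.0, 1.0]")
```

Running both checks in the tool entry point keeps a malformed call from ever reaching the history query.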
Return schema
The base response is the compute_base_rates() payload verbatim, JSON-serialized. Example with default args:
{
"matcher_version": "2026.05.8-hybrid-13d-recency730",
"mode": "soft",
"trade_date": "2026-05-20",
"sample_size": 100,
"live_count": 67,
"backfill_count": 33,
"backfill_weight": 1.0,
"recency_half_life_days": 730,
"total_history_days": 1319,
"match_dimensions": 16,
"matched_bins": {
"regime": "CAUTIOUS",
"vix_close": "VIX_MED",
"dix": "DIX_NEUTRAL",
"hy_oas": "CREDIT_NORMAL"
},
"top_k": 100,
"distance_min": 0.842,
"distance_max": 3.117,
"distance_mean": 1.984,
"forward_1d": {
"count": 98, "effective_count": 76.4,
"mean": 0.12, "median": 0.18, "stdev": 0.94,
"positive_pct": 56.1,
"p5": -1.61, "p25": -0.34, "p75": 0.71, "p95": 1.42,
"worst": -2.87, "best": 2.41,
"ci_low_95": -0.07, "ci_high_95": 0.31
},
"forward_3d": { /* same shape */ },
"forward_5d": { /* same shape */ },
"regime_distribution": {
"CAUTIOUS": 54, "TRANSITIONAL": 28, "RISK-ON": 18
},
"regime_breakdown": {
"CAUTIOUS": { "count": 54, "forward_1d": {...}, "forward_3d": {...}, "forward_5d": {...} },
"TRANSITIONAL": { "count": 28, "forward_1d": {...}, "forward_3d": {...}, "forward_5d": {...} },
"RISK-ON": { "count": 18, "forward_1d": {...}, "forward_3d": {...}, "forward_5d": {...} }
}
}
With include_analogue_dates=true
Appends:
{
"analogue_dates": [
{ "date": "2024-09-12", "distance": 0.842, "weight": 0.91,
"forward_1d": 0.34, "forward_3d": 0.82, "forward_5d": 1.14 },
{ "date": "2025-02-04", "distance": 0.917, "weight": 0.96,
"forward_1d": -0.18, "forward_3d": 0.41, "forward_5d": 0.93 },
/* ... up to top_k entries, sorted by distance ascending ... */
]
}
Each entry carries the matched date, the raw distance the matcher computed, the final analogue weight (after backfill_weight × recency decay), and the forward returns the matcher used in the headline aggregates. The agent can pick the three closest analogues, drill into their reports, and read the actual narrative for each.
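On the consuming side, picking the closest analogues is a small amount of client code. The helper below is a hypothetical agent-side sketch that relies only on the analogue_dates shape shown above; it re-sorts defensively rather than trusting the server ordering.

```python
from typing import Any, Dict, List


def closest_analogues(response: Dict[str, Any], n: int = 3) -> List[Dict[str, Any]]:
    """Return the n analogues with the smallest distance, nearest first."""
    entries = response.get("analogue_dates", [])  # absent unless the flag was set
    return sorted(entries, key=lambda entry: entry["distance"])[:n]


def analogue_dates_only(response: Dict[str, Any], n: int = 3) -> List[str]:
    """Convenience wrapper: just the dates, ready to feed into a report lookup."""
    return [entry["date"] for entry in closest_analogues(response, n)]
```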
With horizon set
Adds a top-level shortcut block so the agent doesn’t have to pick the right forward_Nd key:
{
"horizon": 3,
"headline": {
"mean": 0.34, "median": 0.41,
"positive_pct": 61.2,
"ci_low_95": 0.08, "ci_high_95": 0.59,
"sample_size": 95
}
}
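One plausible way to derive that shortcut block is to copy a subset of the matching forward_Nd stats. The field selection and the mapping of sample_size to the horizon's count are assumptions inferred from the examples in this article, not confirmed behaviour of the tool.

```python
from typing import Any, Dict

# Assumed subset of forward_Nd fields surfaced in the headline.
HEADLINE_FIELDS = ("mean", "median", "positive_pct", "ci_low_95", "ci_high_95")


def add_headline(response: Dict[str, Any], horizon: int) -> Dict[str, Any]:
    """Return a copy of the response with 'horizon' and 'headline' added."""
    if horizon not in (1, 3, 5):
        raise ValueError(f"horizon must be 1, 3 or 5, got {horizon}")
    stats = response[f"forward_{horizon}d"]
    headline = {field: stats[field] for field in HEADLINE_FIELDS}
    headline["sample_size"] = stats["count"]  # assumption: mirrors the horizon's count
    return {**response, "horizon": horizon, "headline": headline}
```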
Mode selection
The HTTP route’s defaults are the tool’s defaults. Always.
mode = "soft"
top_k = 100
recency_half_life_days = 730
backfill_weight = 1.0
dim_trust = _DIM_TRUST_DEFAULT
These are read from the same constants compute_base_rates() uses. When the matcher evolves — Stream C2 introduces hybrid mode, a future tweak bumps top_k to 150, the dim trust map gets a new entry — the MCP tool inherits the change in the next deploy. The client makes the call exactly the same way and gets the new behavior. No tool-side enum to update, no contract to renegotiate.
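A minimal sketch of that inheritance, assuming the tool merges caller overrides onto defaults imported from the matcher module. The DEFAULTS dict here is a hard-coded stand-in for those constants, and resolve_args is a hypothetical helper name.

```python
from typing import Any, Dict, Mapping

# Stand-in for constants that would be imported from app/signals/base_rates.py.
DEFAULTS: Dict[str, Any] = {
    "mode": "soft",
    "top_k": 100,
    "recency_half_life_days": 730,
    "backfill_weight": 1.0,
}


def resolve_args(overrides: Mapping[str, Any]) -> Dict[str, Any]:
    """Overlay caller overrides on the live defaults; None means 'omitted'."""
    resolved = dict(DEFAULTS)
    for name, value in overrides.items():
        if value is not None:
            resolved[name] = value
    # dim_trust is deliberately absent from DEFAULTS: when omitted, the matcher
    # applies its own trust map; an explicit {} is passed through to disable it.
    return resolved
```

Because the tool never hard-codes the values itself, a change to the matcher's constants reaches MCP clients on the next deploy.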
MATCHER_VERSION is emitted on every response (current value: "2026.05.8-hybrid-13d-recency730"). The version tag is the agent’s hook for detecting matcher upgrades. An agent that pins a thesis against a specific matcher version can compare response.matcher_version to its expected value and re-run if the matcher has bumped.
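An agent-side version check could be as small as the sketch below. The helper name and the idea of storing a pinned version alongside a thesis are illustrative assumptions.

```python
from typing import Any, Dict


def matcher_changed(response: Dict[str, Any], pinned_version: str) -> bool:
    """True when the response came from a different matcher than the pinned one."""
    return response.get("matcher_version") != pinned_version


# Usage sketch: re-run the comparison when the pinned version no longer matches.
PINNED = "2026.05.8-hybrid-13d-recency730"
stale = matcher_changed({"matcher_version": PINNED}, PINNED)  # False: nothing to redo
```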
Bump rules for MATCHER_VERSION are defined in app/signals/base_rates.py — bump on mode-default flip, default top_k / recency change, soft dim set change, bin-threshold change, distance-metric semantics change. Don’t bump on behavior-identical refactors. After a bump, python scripts/backfill_base_rate_predictions.py --force re-seeds the calibration substrate.
Example usage
Example A — live signals, default args
The most common call. Read today’s row, match against history, return the headline.
Request:
{ "tool": "compare_to_history", "arguments": {} }
Response (abbreviated):
{
"matcher_version": "2026.05.8-hybrid-13d-recency730",
"mode": "soft",
"trade_date": "2026-05-20",
"sample_size": 100,
"forward_1d": { "mean": 0.12, "median": 0.18, "positive_pct": 56.1, "ci_low_95": -0.07, "ci_high_95": 0.31, "count": 98 },
"forward_3d": { "mean": 0.34, "median": 0.41, "positive_pct": 61.2, "ci_low_95": 0.08, "ci_high_95": 0.59, "count": 95 },
"forward_5d": { "mean": 0.51, "median": 0.62, "positive_pct": 64.8, "ci_low_95": 0.21, "ci_high_95": 0.84, "count": 93 },
"regime_distribution": { "CAUTIOUS": 54, "TRANSITIONAL": 28, "RISK-ON": 18 }
}
The agent reads: “100 analogues, modest positive lean across all three horizons, CI on 3d return excludes zero — moderate conviction long.” Cheap, fast, one call.
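The "CI excludes zero" reading can be made mechanical. The sketch below is a hypothetical agent-side helper that uses only the ci_low_95, ci_high_95 and mean fields from the response; the label strings are invented for illustration.

```python
from typing import Dict


def ci_excludes_zero(stats: Dict[str, float]) -> bool:
    """True when the 95% interval sits entirely above or entirely below zero."""
    return stats["ci_low_95"] > 0 or stats["ci_high_95"] < 0


def lean(stats: Dict[str, float]) -> str:
    """Classify a forward_Nd block as positive, negative, or inconclusive."""
    if not ci_excludes_zero(stats):
        return "inconclusive"
    return "positive" if stats["mean"] > 0 else "negative"
```

Applied to the response above, the 1d block is inconclusive (its interval spans zero) while the 3d and 5d blocks read positive.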
Example B — hypothetical scenario with analogues + trust override
A research agent wants to know what historically happens when VIX climbs to 32 and DIX collapses to 0.39 in a CAUTIOUS regime — and wants the matched dates so it can read the actual reports. It also doesn’t want the matcher down-weighting the breadth-proxy dim, so it overrides dim_trust.
Request:
{
"tool": "compare_to_history",
"arguments": {
"signals_dict": {
"vix_close": 32.0,
"dix": 0.39,
"regime": "CAUTIOUS",
"hy_oas": 4.1,
"pct_above_50sma": 28,
"gex": -2.4,
"energy_regime": "STABLE"
},
"horizon": 5,
"include_analogue_dates": true,
"dim_trust": { "pct_above_50sma": 1.0, "hy_oas": 1.0 }
}
}
Response (abbreviated):
{
"matcher_version": "2026.05.8-hybrid-13d-recency730",
"mode": "soft",
"sample_size": 42,
"horizon": 5,
"headline": {
"mean": -0.84, "median": -0.61, "positive_pct": 38.1,
"ci_low_95": -1.42, "ci_high_95": -0.27,
"sample_size": 41
},
"forward_1d": { /* ... */ },
"forward_3d": { /* ... */ },
"forward_5d": { "mean": -0.84, "median": -0.61, "positive_pct": 38.1, "ci_low_95": -1.42, "ci_high_95": -0.27, "count": 41 },
"regime_distribution": { "CAUTIOUS": 31, "RISK-OFF": 11 },
"analogue_dates": [
{ "date": "2022-09-13", "distance": 0.612, "weight": 0.41, "forward_1d": -1.94, "forward_3d": -3.12, "forward_5d": -4.27 },
{ "date": "2023-03-09", "distance": 0.701, "weight": 0.57, "forward_1d": -0.32, "forward_3d": 0.18, "forward_5d": 1.42 },
{ "date": "2024-04-15", "distance": 0.844, "weight": 0.82, "forward_1d": -0.94, "forward_3d": -1.21, "forward_5d": -0.84 }
]
}
The agent reads: “42 analogues for that setup, 5d mean −0.84% with CI excluding zero on the negative side, mostly CAUTIOUS with some RISK-OFF spillover — bearish lean. Top three closest historical matches were 2022-09-13 (sharp drop), 2023-03-09 (recovery), 2024-04-15 (mild drop). I’ll pull those three reports next.” The override to dim_trust ensures the breadth and HY OAS dims contribute at full weight rather than the default proxy-discount.
Implementation notes
For the implementer:
- Reuse compute_base_rates(). No duplicate matcher logic anywhere in the MCP package. The tool is a wrapper: validate args → load history → call compute_base_rates() → optionally append analogue_dates → return JSON-serialized. If you find yourself reimplementing distance math, stop and route through the existing function.
- History pull mirrors the HTTP route. SELECT * FROM daily_signals ORDER BY trade_date ASC: the SELECT * is load-bearing (every dim in _SOFT_NUMERIC_DIMS must be available). See app/signals/routes.py:get_base_rates for the canonical pattern.
- Argument validation runs before the DB hit. Build the unknown-key WARNING + the bounds-check failures into a small validator that runs in the tool entry point. An out-of-bounds top_k typo shouldn’t burn a SELECT *.
- Registration mirrors mcp/mcp_server/main.py exactly. Use the existing @mcp.tool() decorator pattern with Annotated[type, "description"] for every argument. The tool body delegates to a function in mcp/mcp_server/tools/public.py (alongside market_pulse, analyze_signals, etc.). The public-tools module owns: cache key, tool_logger.log_call(), the await get(...) against the backend.
- Backend call surface. The simplest implementation calls the existing HTTP route. To support signals_dict (which the route doesn’t currently accept), either (a) extend /api/v1/signals/base-rates to take a POSTed signals dict alongside its query params, or (b) add a sibling route POST /api/v1/signals/base-rates/compare that takes the dict in the body. Option (b) keeps the GET route’s cache-friendliness intact and is the recommended path; document the new route via the auto-generated KB pipeline.
- Cache key. Include a stable hash of signals_dict + all overrides. The dashboard’s /signals/base-rates GET is cached for 5 min + 10 min SWR (per cache_control_middleware); the MCP tool should match: same data, same TTL.
- Logging. Use tool_logger.log_call("compare_to_history", args, latency_ms, status) so the call shows up in the same observability pipe as the existing tools. Unknown-key WARNINGs go through the module logger (logging.getLogger("mcp_server.tools.public")).
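For the cache-key note, one common approach is to hash a canonical JSON serialization of the arguments. The sketch below assumes the arguments are JSON-serializable; the key prefix and function name are illustrative, not taken from the existing tools.

```python
import hashlib
import json
from typing import Any, Mapping


def cache_key(arguments: Mapping[str, Any]) -> str:
    """Build a key that is stable across dict ordering of the same arguments."""
    canonical = json.dumps(
        arguments,
        sort_keys=True,          # key order must not change the hash
        separators=(",", ":"),   # no whitespace variance
        default=str,             # last-resort stringification for odd values
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"compare_to_history:{digest}"
```

Hashing the validated arguments rather than the raw ones would let "22.4" and 22.4 share a cache entry, since coercion has already happened by then.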
Testing requirements
Per P9 — every new behavior ships with tests in the same commit. The contract is non-trivial; tests must pin all three load-bearing surfaces.
- End-to-end contract test against the live HTTP route. A pytest in mcp/tests/test_tools_public.py calls compare_to_history with no args and /api/v1/signals/base-rates with no args (against the same backend), then asserts the JSON responses are equal modulo the MCP-only fields. Pins the “tool is a wrapper, not a reimplementation” invariant. If anyone forks the matcher logic into the tool, this test fails.
- Schema-validation test for unknown keys. Call the tool with signals_dict = {"vix_close": 22.4, "totally_made_up_key": 999}. Assert that (a) the call succeeds, (b) a WARNING was logged matching the unknown-key pattern, (c) the response is identical to the same call without the unknown key. Pins the “loud but graceful” contract from Decision 1.
- dim_trust override round-trip. Call with dim_trust={"hy_oas": 1.0, "pct_above_50sma": 1.0} and a fixed signals_dict, then call again with dim_trust={} (disable trust entirely). Assert the distances reported in analogue_dates differ, which proves the override actually reaches the matcher.
- include_analogue_dates flag test. Call with include_analogue_dates=false (default) and assert analogue_dates is absent. Call with true and assert it’s present, sorted by distance ascending, and the entry count equals sample_size.
- MATCHER_VERSION emission. Every response carries matcher_version. Pin via a tiny assertion in the contract test; if a future refactor drops the field, the test catches it.
- KB article presence test. Add a row to scripts/verify-docs.sh (or the existing KB-presence test in tests/) confirming ui/src/content/kb/mcp/compare-to-history.mdx exists. Per P3, every non-trivial change includes its KB article in the same PR; the test enforces it stays.
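The contract and flag tests above could share a couple of small assertion helpers. This is a sketch only: which fields count as MCP-only is an assumption (here, horizon and headline), and the helpers take already-fetched payloads so they stay independent of how the tool and route are invoked.

```python
from typing import Any, Dict

# Assumption: these are the only fields the MCP tool adds on top of the route payload.
MCP_ONLY_FIELDS = ("horizon", "headline")


def strip_mcp_only(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Drop tool-only fields so the rest can be compared to the HTTP route."""
    return {k: v for k, v in payload.items() if k not in MCP_ONLY_FIELDS}


def assert_wrapper_contract(tool_response: Dict[str, Any], route_response: Dict[str, Any]) -> None:
    """The tool must emit matcher_version and otherwise mirror the route."""
    assert "matcher_version" in tool_response
    assert strip_mcp_only(tool_response) == strip_mcp_only(route_response)


def assert_analogue_dates_shape(response: Dict[str, Any]) -> None:
    """analogue_dates must be sorted by distance and match sample_size."""
    distances = [entry["distance"] for entry in response["analogue_dates"]]
    assert distances == sorted(distances)
    assert len(distances) == response["sample_size"]
```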
See also
- Historical Base Rate Engine: the matcher this tool wraps, with the full dim list and trust-multiplier explanation.
- GET /api/v1/signals/base-rates: the HTTP route with the same defaults.
- GET /api/v1/signals/base-rates/calibration: realized-vs-forecast accuracy curve for the matcher.
- MCP server overview: connection setup, the other eight tools (market_pulse, analyze_signals, get_quote, option_chain, read_report, portfolio_status, plan_trade, execute_trade).
- Implementation source: mcp/mcp_server/tools/public.py (tool body), mcp/mcp_server/main.py (registration), app/signals/base_rates.py (matcher).