MCP tool: compare_to_history

Last verified: 2026-05-20

What this tool does

The base-rate matcher is the platform’s flagship analytical surface — “when markets looked like this in the past, what happened next.” It runs over the full daily_signals history, scores each historical row against today’s signal vector using weighted Euclidean distance (or legacy hard-bin equality), and returns the forward 1d / 3d / 5d SPY return distribution across the top-K analogues.
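As a rough illustration of the soft-mode scoring described above, the sketch below ranks historical rows by weighted Euclidean distance and keeps the top K. The function names, the per-dim weights dict, and the assumption that dims arrive already normalized are hypothetical; the real logic lives in app/signals/base_rates.py and may differ.

```python
import math

def weighted_distance(today, row, weights):
    """Weighted Euclidean distance over dims present in both vectors."""
    total = 0.0
    for dim, weight in weights.items():
        # Assumption: a dim missing on either side contributes nothing.
        if dim in today and dim in row:
            total += weight * (today[dim] - row[dim]) ** 2
    return math.sqrt(total)

def top_k_analogues(today, history, weights, k):
    """Return the k (distance, row) pairs closest to today's vector."""
    scored = [(weighted_distance(today, row, weights), row) for row in history]
    # Sort on distance only, so ties never try to compare the row dicts.
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]
```

The forward-return distribution would then be aggregated over the rows this returns.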

compare_to_history is the MCP exposure of that matcher. An AI agent calls it with either today’s live signal dict or a hypothetical scenario (“what if VIX were 35 and DIX were 0.38?”) and reads back the analogue forward-return stats. No dashboard scrape, no manual API juggling — one tool, one round-trip, the same numbers the dashboard surfaces.

The tool is a thin wrapper. It does not own matcher logic. Every numeric decision — distance metric, dim trust, recency decay, bootstrap CI — lives in app/signals/base_rates.py and reaches the tool through compute_base_rates(). When the matcher evolves (Stream C2 hybrid tags + Euclidean), the tool inherits the change without a client edit.

C2 mode dispatch (2026-05)

MATCHER_VERSION bumped to 2026.05.8-hybrid-13d-recency730 when Stream C2 landed. Three modes joined the catalog alongside the legacy hard / soft:

euclidean — alias for soft. Same behaviour; the new name makes “what does this actually do” clearer to a reasoning model browsing the schema.
tag — pre-filter on a five-component coarse subset (gex, energy_regime, dix, vix_regime, zero_dte_pcr). Returns the matching set ranked by recency. No Euclidean leg.
hybrid — tag pre-filter, then Euclidean rank within the filtered set (Option C per DOCTRINE §6 Q1). This is the intended future default. For now the default stays soft per DOCTRINE P8 (additive, not destructive): the calibration loop decides if and when hybrid becomes the default.

When the tag filter under-shoots _HYBRID_MIN_ANALOGUES (30 analogues) the matcher falls back to Euclidean over the unfiltered history. The response carries hybrid_fallback=true plus a hybrid_fallback_reason string so consumers see the degradation. Per DOCTRINE P0 — no silent fallback.
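The fallback rule can be sketched as below. This is a minimal illustration, assuming the five tag components are already binned into coarse labels and matched by equality, and that "under-shoots" means fewer than 30 rows; the helper name and the reason string are hypothetical, not the matcher's actual code.

```python
HYBRID_MIN_ANALOGUES = 30  # mirrors _HYBRID_MIN_ANALOGUES as described above

# The five-component coarse subset named in the tag mode description.
TAG_DIMS = ("gex", "energy_regime", "dix", "vix_regime", "zero_dte_pcr")

def hybrid_candidates(today_tags, history):
    """Tag pre-filter with a loud fallback to the unfiltered history."""
    filtered = [
        row for row in history
        if all(row.get(dim) == today_tags.get(dim) for dim in TAG_DIMS)
    ]
    if len(filtered) < HYBRID_MIN_ANALOGUES:
        reason = (
            f"tag filter matched {len(filtered)} rows; "
            f"minimum is {HYBRID_MIN_ANALOGUES}"
        )
        # No silent fallback: the flags travel with the candidate set.
        return history, {"hybrid_fallback": True, "hybrid_fallback_reason": reason}
    return filtered, {"hybrid_fallback": False}
```

The Euclidean ranking then runs over whichever candidate set comes back, and the flags are merged into the response.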

Three design decisions

These are locked by D9 in IDEAS.md. Each was a real fork; the rejected options are recorded for the same reason DOCTRINE.md records D1–D16 — so a future session understands why we chose what we chose.

1. Argument shape — full signal dict

Options considered.

Full signal dict. Caller passes {vix_close: 22.4, dix: 0.46, regime: "CAUTIOUS", ...} — any subset of daily_signals columns. Powerful: the agent can probe arbitrary combinations including dims the dashboard doesn’t expose. Risk: the agent can introduce typos or unknown dimensions and the matcher silently ignores them, masking drift.
Constrained signal-state pairs. Tool exposes a narrow enum of matchable dims (vix_close, dix, hy_oas, …) and rejects anything else. Safer; forces the agent to know what’s matchable. Costs the open-ended “what if” exploration that makes the tool useful to a reasoning model.

Decision — full dict, validated against signal_definitions.json::api_field_path. Unknown keys are accepted but logged at WARNING and ignored by the matcher. The matcher already handles missing dims gracefully (each contributes nothing to the distance sum); silent-drop is the existing behavior. The new bit is the WARNING log so unknown keys are loud in observability without breaking the call.

Rationale. Keeps the tool open to hypothetical scenarios. Loud-fails silent dimension drift (a renamed column showing up as “unknown key” in logs is exactly the signal we want). Matches DOCTRINE P0 — no silent degradation. The validator is one pass over the registry; cheap.

2. Return shape — summary by default, analogue dates on request

Options considered.

Summary only. Return the statistical headline: mean_forward_return, confidence_interval, sample_size, regime_breakdown. Small response. Good enough for an agent that wants the read and moves on.
Summary plus analogue dates. Every call also returns the list of historical dates that matched, with per-analogue distance, weight, and forward returns. Lets the agent drill in (“show me what happened on the three closest historical analogues”). Larger payload: 100 analogues × 6 fields adds real bytes.

Decision — both, gated by include_analogue_dates: bool = false. Default false to keep responses small. When true, the response appends analogue_dates: [{date, distance, weight, forward_1d, forward_3d, forward_5d}, ...] — one entry per analogue, sorted by distance ascending. The existing HTTP route already supports this via a query flag; the MCP tool surfaces the same pattern.

Rationale. Tool agents and human-facing agents have different needs. A scheduling agent calling compare_to_history every 15 minutes wants the headline cheap; a research agent building a thesis wants the dates. Default optimizes for the common case; the flag opens the deeper read without two tools to maintain.

3. Mode selection — respect the live default, accept overrides

Options considered.

Force one mode. Always run hybrid (or always soft, or always hard). Predictable, but it freezes the tool against a single matcher version: whenever the platform default changes, as it could once C2’s hybrid mode is promoted, every MCP client has to learn the change.
Respect the live default. Read whatever mode the HTTP route defaults to today. When the matcher upgrades and MATCHER_VERSION bumps, the tool picks up the new default automatically.

Decision — respect the live default, accept the same overrides as the HTTP route. Today that means mode="soft", top_k=100, recency_half_life_days=730, backfill_weight=1.0, with dim_trust as an optional override dict. If the calibration loop later flips the default to hybrid, the tool picks it up with no client change required.

Rationale. Future-proof. Clients never have to reason about which matcher version they’re calling — they get whatever the platform’s current best default is. The override knobs are there for agents that want a specific comparison (e.g. mode="hard" to compare a soft-mode prediction against the legacy bin-equality matcher).

Argument schema

The tool signature mirrors the HTTP route surface plus the signals_dict argument. JSON schema:

{
  "name": "compare_to_history",
  "description": "Match a live or hypothetical signal vector against daily_signals history. Returns forward-return distributions (1d / 3d / 5d SPY) for similar historical setups.",
  "input_schema": {
    "type": "object",
    "properties": {
      "signals_dict": {
        "type": "object",
        "description": "Signal vector to match against history. Keys must be daily_signals column names (validated against signal_definitions.json). Unknown keys are accepted but logged at WARNING and ignored by the matcher. Omit to use the latest live row from daily_signals.",
        "additionalProperties": true
      },
      "horizon": {
        "type": "integer",
        "enum": [1, 3, 5],
        "description": "Which forward-return horizon to highlight. Optional; the response always carries all three (forward_1d, forward_3d, forward_5d). When set, the response also includes a top-level headline reading for the requested horizon."
      },
      "mode": {
        "type": "string",
        "enum": ["soft", "hard", "euclidean", "tag", "hybrid"],
        "description": "Matcher mode. Omit to use the live default (currently 'soft'). 'soft' / 'euclidean' are aliases — weighted Euclidean distance over the full signal vector. 'hard' = legacy bin-equality. 'tag' (C2, 2026-05) = coarse-tag pre-filter only, ranked by recency. 'hybrid' (C2) = tag pre-filter then Euclidean rank within the filtered set; option C per DOCTRINE §6 Q1. When the tag filter under-shoots _HYBRID_MIN_ANALOGUES the matcher falls back to Euclidean and surfaces hybrid_fallback=true on the response (loud-fail per DOCTRINE P0)."
      },
      "top_k": {
        "type": "integer",
        "minimum": 1,
        "maximum": 500,
        "description": "Euclidean / tag analogue-set cap. Omit to use the live default (currently 100)."
      },
      "recency_half_life_days": {
        "type": "integer",
        "minimum": 0,
        "description": "Exponential decay of analogue weight by trade-date age. 0 disables decay. Omit to use the live default (currently 730 = 2-year half-life)."
      },
      "backfill_weight": {
        "type": "number",
        "minimum": 0.0,
        "maximum": 1.0,
        "description": "Weight applied to rows tagged report_filename='historical-backfill'. Live rows always weight 1.0. Omit to use the live default (currently 1.0)."
      },
      "dim_trust": {
        "type": "object",
        "description": "Per-dim trust multiplier override (e.g. {'pct_above_50sma': 1.0, 'hy_oas': 1.0}). Pass {} to disable the default proxy-trust down-weighting entirely. Omit to use the live _DIM_TRUST_DEFAULT.",
        "additionalProperties": { "type": "number" }
      },
      "include_analogue_dates": {
        "type": "boolean",
        "default": false,
        "description": "When true, the response appends analogue_dates: [{date, distance, weight, forward_1d, forward_3d, forward_5d}, ...]. Adds payload size; off by default."
      }
    }
  }
}

Validation rules

Unknown signals_dict keys. Every key is checked against signal_definitions.json::api_field_path and the daily_signals column set. Unknown keys log a single WARNING event per call (logger.warning("compare_to_history: unknown signal key %r ignored", key)) and drop out of the call to compute_base_rates(). The matcher already tolerates missing dims; this is the existing graceful path.
Type coercion. Numeric dims that arrive as strings ("22.4") are coerced once; failure logs a WARNING and drops the key. Categorical dims are taken verbatim.
Missing signals_dict entirely. When omitted, the tool reads SELECT * FROM daily_signals ORDER BY trade_date DESC LIMIT 1 — the same path the HTTP route uses. This is the “match today against history” common case.
Override-knob bounds. top_k, recency_half_life_days, backfill_weight use the same bounds the HTTP route declares. Out-of-bounds values raise a structured error before reaching the matcher.
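The four rules above could be collected into a small pre-flight validator along these lines. This is a sketch under stated assumptions: the numeric and categorical dim sets are passed in rather than loaded from signal_definitions.json, the coercion-failure log message is invented for illustration, and ValueError stands in for whatever structured error the tool actually raises.

```python
import logging

logger = logging.getLogger("mcp_server.tools.public")

# Bounds copied from the argument schema; None means "no upper bound".
BOUNDS = {
    "top_k": (1, 500),
    "recency_half_life_days": (0, None),
    "backfill_weight": (0.0, 1.0),
}

def validate_signals(signals, numeric_dims, categorical_dims):
    """Drop unknown or uncoercible keys, logging each one at WARNING."""
    clean = {}
    for key, value in signals.items():
        if key in categorical_dims:
            clean[key] = value  # categorical dims are taken verbatim
        elif key in numeric_dims:
            try:
                clean[key] = float(value)  # coerces "22.4" to 22.4
            except (TypeError, ValueError):
                logger.warning(
                    "compare_to_history: signal %r has non-numeric value %r; dropped",
                    key, value,
                )
        else:
            logger.warning("compare_to_history: unknown signal key %r ignored", key)
    return clean

def check_bounds(overrides):
    """Raise before the matcher is reached if any knob is out of bounds."""
    for name, value in overrides.items():
        if name not in BOUNDS:
            continue
        low, high = BOUNDS[name]
        if value < low or (high is not None and value > high):
            raise ValueError(f"{name}={value!r} is outside [{low}, {high}]")
```

Only the dict returned by validate_signals would be forwarded to compute_base_rates().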

Return schema

The base response is the compute_base_rates() payload verbatim, JSON-serialized. Example with default args:

{
  "matcher_version": "(current MATCHER_VERSION at call time — 2026.05.8-era tag shown when this article was written)",
  "mode": "soft",
  "trade_date": "2026-05-20",
  "sample_size": 100,
  "live_count": 67,
  "backfill_count": 33,
  "backfill_weight": 1.0,
  "recency_half_life_days": 730,
  "total_history_days": 1319,
  "match_dimensions": 16,
  "matched_bins": {
    "regime": "CAUTIOUS",
    "vix_close": "VIX_MED",
    "dix": "DIX_NEUTRAL",
    "hy_oas": "CREDIT_NORMAL"
  },
  "top_k": 100,
  "distance_min": 0.842,
  "distance_max": 3.117,
  "distance_mean": 1.984,
  "forward_1d": {
    "count": 98, "effective_count": 76.4,
    "mean": 0.12, "median": 0.18, "stdev": 0.94,
    "positive_pct": 56.1,
    "p5": -1.61, "p25": -0.34, "p75": 0.71, "p95": 1.42,
    "worst": -2.87, "best": 2.41,
    "ci_low_95": -0.07, "ci_high_95": 0.31
  },
  "forward_3d": { /* same shape */ },
  "forward_5d": { /* same shape */ },
  "regime_distribution": {
    "CAUTIOUS": 54, "TRANSITIONAL": 28, "RISK-ON": 18
  },
  "regime_breakdown": {
    "CAUTIOUS": { "count": 54, "forward_1d": {...}, "forward_3d": {...}, "forward_5d": {...} },
    "TRANSITIONAL": { "count": 28, "forward_1d": {...}, "forward_3d": {...}, "forward_5d": {...} },
    "RISK-ON": { "count": 18, "forward_1d": {...}, "forward_3d": {...}, "forward_5d": {...} }
  }
}

With `include_analogue_dates=true`

Appends:

{
  "analogue_dates": [
    { "date": "2024-09-12", "distance": 0.842, "weight": 0.91,
      "forward_1d": 0.34, "forward_3d": 0.82, "forward_5d": 1.14 },
    { "date": "2025-02-04", "distance": 0.917, "weight": 0.96,
      "forward_1d": -0.18, "forward_3d": 0.41, "forward_5d": 0.93 },
    /* ... up to top_k entries, sorted by distance ascending ... */
  ]
}

Each entry carries the matched date, the raw distance the matcher computed, the final analogue weight (after backfill_weight × recency decay), and the forward returns the matcher used in the headline aggregates. The agent can pick the three closest analogues, drill into their reports, and read the actual narrative for each.
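For intuition about the weight field, here is a minimal sketch of a half-life decay combined with the backfill weight. It assumes the plain form 0.5 ** (age / half_life) and a simple product of the two factors; the function name and signature are hypothetical, and the matcher may normalize or combine the terms differently, so do not expect it to reproduce the sample weights shown above.

```python
from datetime import date

def analogue_weight(analogue_date, as_of, half_life_days=730,
                    is_backfill=False, backfill_weight=1.0):
    """Backfill weight times exponential recency decay (illustrative only)."""
    base = backfill_weight if is_backfill else 1.0  # live rows weight 1.0
    if half_life_days == 0:
        return base  # 0 disables decay, per the argument schema
    age_days = (as_of - analogue_date).days
    return base * 0.5 ** (age_days / half_life_days)
```

Under these assumptions an analogue exactly 730 days old carries half the weight of one from today.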

With `horizon` set

Adds a top-level shortcut block so the agent doesn’t have to pick the right forward_Nd key:

{
  "horizon": 3,
  "headline": {
    "mean": 0.34, "median": 0.41,
    "positive_pct": 61.2,
    "ci_low_95": 0.08, "ci_high_95": 0.59,
    "sample_size": 95
  }
}
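One way the shortcut block could be derived from the base payload is sketched below. The helper name is hypothetical; the field selection follows the sample above, where the headline's sample_size matches the count of the chosen forward_Nd block.

```python
HEADLINE_KEYS = ("mean", "median", "positive_pct", "ci_low_95", "ci_high_95")

def add_headline(response, horizon):
    """Return a copy of the response with horizon and headline attached."""
    if horizon not in (1, 3, 5):
        raise ValueError(f"horizon must be 1, 3, or 5, got {horizon!r}")
    stats = response[f"forward_{horizon}d"]
    headline = {key: stats[key] for key in HEADLINE_KEYS}
    headline["sample_size"] = stats["count"]
    # The three forward_Nd blocks stay in place; the headline is additive.
    return {**response, "horizon": horizon, "headline": headline}
```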

Mode selection

The HTTP route’s defaults are the tool’s defaults. Always.

mode = "soft"
top_k = 100
recency_half_life_days = 730
backfill_weight = 1.0
dim_trust = _DIM_TRUST_DEFAULT

The mode, recency half-life, and top_k are read from SERVED_MATCHER_CONFIG — the pinned served-config constant in app/signals/base_rates.py that the live HTTP route, the calibration capture, and the reseed script all consume. (They are NOT compute_base_rates()’s own bare defaults, which differ — the compute-level default is hard mode with a smaller pool; the served constant is the contract.) When the served config evolves — a mode flip, a top_k change, a new dim-trust entry — the MCP tool inherits the change in the next deploy. The client makes the call exactly the same way and gets the new behavior. No tool-side enum to update, no contract to renegotiate.

MATCHER_VERSION is emitted on every response — read response.matcher_version for the live value rather than trusting a version string printed in documentation (the constant lives in app/signals/base_rates.py; the hybrid matcher article decodes the current tag). The version tag is the agent’s hook for detecting matcher upgrades. An agent that pins a thesis against a specific matcher version can compare response.matcher_version to its expected value and re-run if the matcher has bumped.
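On the agent side, the pin check described above can be very small. The helper below is a hypothetical sketch, not part of the tool; it only compares the emitted tag with the one the agent stored alongside its thesis.

```python
def needs_rerun(response, pinned_version):
    """True when the matcher has bumped since the thesis was pinned."""
    live = response.get("matcher_version")
    if live is None:
        # Every response is expected to carry the tag; treat absence as an error.
        raise KeyError("response is missing matcher_version")
    return live != pinned_version
```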

Bump rules for MATCHER_VERSION are defined in app/signals/base_rates.py — bump on mode-default flip, default top_k / recency change, soft dim set change, bin-threshold change, distance-metric semantics change. Don’t bump on behavior-identical refactors. After a bump, python scripts/backfill_base_rate_predictions.py --force re-seeds the calibration substrate.

Example usage

Example A — live signals, default args

The most common call. Read today’s row, match against history, return the headline.

Request:

{ "tool": "compare_to_history", "arguments": {} }

Response (abbreviated):

{
  "matcher_version": "(current MATCHER_VERSION at call time — 2026.05.8-era tag shown when this article was written)",
  "mode": "soft",
  "trade_date": "2026-05-20",
  "sample_size": 100,
  "forward_1d": { "mean": 0.12, "median": 0.18, "positive_pct": 56.1, "ci_low_95": -0.07, "ci_high_95": 0.31, "count": 98 },
  "forward_3d": { "mean": 0.34, "median": 0.41, "positive_pct": 61.2, "ci_low_95": 0.08, "ci_high_95": 0.59, "count": 95 },
  "forward_5d": { "mean": 0.51, "median": 0.62, "positive_pct": 64.8, "ci_low_95": 0.21, "ci_high_95": 0.84, "count": 93 },
  "regime_distribution": { "CAUTIOUS": 54, "TRANSITIONAL": 28, "RISK-ON": 18 }
}

The agent reads: “100 analogues, modest positive lean across all three horizons, 3d and 5d CIs exclude zero while the 1d CI straddles it: moderate conviction long.” Cheap, fast, one call.
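The "CI excludes zero" read can be made mechanical. The helper below is a hypothetical client-side sketch that uses only the ci_low_95 and ci_high_95 fields from the response above.

```python
def ci_excludes_zero(stats):
    """True when the 95% CI lies entirely above or entirely below zero."""
    return stats["ci_low_95"] > 0 or stats["ci_high_95"] < 0

# Values copied from the abbreviated Example A response.
example_a = {
    "forward_1d": {"ci_low_95": -0.07, "ci_high_95": 0.31},
    "forward_3d": {"ci_low_95": 0.08, "ci_high_95": 0.59},
    "forward_5d": {"ci_low_95": 0.21, "ci_high_95": 0.84},
}
signal_by_horizon = {key: ci_excludes_zero(block) for key, block in example_a.items()}
```

With these numbers the 1d interval straddles zero, while the 3d and 5d intervals sit above it.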

Example B — hypothetical scenario with analogues + trust override

A research agent wants to know what historically happens when VIX climbs to 32 and DIX collapses to 0.39 in a CAUTIOUS regime — and wants the matched dates so it can read the actual reports. It also doesn’t want the matcher down-weighting the breadth-proxy dim, so it overrides dim_trust.

Request:

{
  "tool": "compare_to_history",
  "arguments": {
    "signals_dict": {
      "vix_close": 32.0,
      "dix": 0.39,
      "regime": "CAUTIOUS",
      "hy_oas": 4.1,
      "pct_above_50sma": 28,
      "gex": -2.4,
      "energy_regime": "STABLE"
    },
    "horizon": 5,
    "include_analogue_dates": true,
    "dim_trust": { "pct_above_50sma": 1.0, "hy_oas": 1.0 }
  }
}

Response (abbreviated):

{
  "matcher_version": "(current MATCHER_VERSION at call time — 2026.05.8-era tag shown when this article was written)",
  "mode": "soft",
  "sample_size": 42,
  "horizon": 5,
  "headline": {
    "mean": -0.84, "median": -0.61, "positive_pct": 38.1,
    "ci_low_95": -1.42, "ci_high_95": -0.27,
    "sample_size": 41
  },
  "forward_1d": { /* ... */ },
  "forward_3d": { /* ... */ },
  "forward_5d": { "mean": -0.84, "median": -0.61, "positive_pct": 38.1, "ci_low_95": -1.42, "ci_high_95": -0.27, "count": 41 },
  "regime_distribution": { "CAUTIOUS": 31, "RISK-OFF": 11 },
  "analogue_dates": [
    { "date": "2022-09-13", "distance": 0.612, "weight": 0.41, "forward_1d": -1.94, "forward_3d": -3.12, "forward_5d": -4.27 },
    { "date": "2023-03-09", "distance": 0.701, "weight": 0.57, "forward_1d": -0.32, "forward_3d": 0.18, "forward_5d": 1.42 },
    { "date": "2024-04-15", "distance": 0.844, "weight": 0.82, "forward_1d": -0.94, "forward_3d": -1.21, "forward_5d": -0.84 }
  ]
}

The agent reads: “42 analogues for that setup, 5d mean −0.84% with CI excluding zero on the negative side, mostly CAUTIOUS with some RISK-OFF spillover — bearish lean. Top three closest historical matches were 2022-09-13 (sharp drop), 2023-03-09 (recovery), 2024-04-15 (mild drop). I’ll pull those three reports next.” The override to dim_trust ensures the breadth and HY OAS dims contribute at full weight rather than the default proxy-discount.

Implementation notes

For the implementer:

Reuse compute_base_rates(). No duplicate matcher logic anywhere in the MCP package. The tool is a wrapper: validate args → load history → call compute_base_rates() → optionally append analogue_dates → return JSON-serialized. If you find yourself reimplementing distance math, stop and route through the existing function.
History pull mirrors the HTTP route. SELECT * FROM daily_signals ORDER BY trade_date ASC — the SELECT * is load-bearing (every dim in _SOFT_NUMERIC_DIMS must be available). See app/signals/routes.py:get_base_rates for the canonical pattern.
Argument validation runs before the DB hit. Build the unknown-key WARNING and the bounds-check failures into a small validator that runs in the tool entry point. An out-of-bounds top_k typo shouldn’t burn a SELECT *.
Registration mirrors mcp/mcp_server/main.py exactly. Use the existing @mcp.tool() decorator pattern with Annotated[type, "description"] for every argument. The tool body delegates to a function in mcp/mcp_server/tools/public.py (alongside market_pulse, analyze_signals, etc.). The public-tools module owns: cache key, tool_logger.log_call(), the await get(...) against the backend.
Backend call surface. The simplest implementation calls the existing HTTP route. To support signals_dict (which the route doesn’t currently accept), either (a) extend /api/v1/signals/base-rates to take a POSTed signals dict alongside its query params, or (b) add a sibling route POST /api/v1/signals/base-rates/compare that takes the dict in the body. Option (b) keeps the GET route’s cache-friendliness intact and is the recommended path; document the new route via the auto-generated KB pipeline.
Cache key. Include a stable hash of signals_dict + all overrides. The dashboard’s /signals/base-rates GET is cached for 5 min + 10 min SWR (per cache_control_middleware); the MCP tool should match — same data, same TTL.
Logging. Use tool_logger.log_call("compare_to_history", args, latency_ms, status) so the call shows up in the same observability pipe as the existing tools. Unknown-key WARNINGs go through the module logger (logging.getLogger("mcp_server.tools.public")).
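For the cache-key note above, one possible way to build a stable hash is to serialize the arguments canonically and digest the result. The function name and the key prefix are assumptions; the existing public-tools module may already have its own cache-key convention that should take precedence.

```python
import hashlib
import json

def cache_key(signals_dict, overrides):
    """Stable key: identical arguments hash identically regardless of key order."""
    payload = {"signals": signals_dict, "overrides": overrides}
    # sort_keys plus fixed separators gives a canonical JSON string.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"compare_to_history:{digest}"
```

An omitted signals_dict (None) hashes differently from an empty dict here, which keeps the "match today" call apart from an explicit empty scenario.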

Testing requirements

Per P9 — every new behavior ships with tests in the same commit. The contract is non-trivial; tests must pin all three load-bearing surfaces.

End-to-end contract test against the live HTTP route. A pytest in mcp/tests/test_tools_public.py calls compare_to_history with no args and /api/v1/signals/base-rates with no args (against the same backend), then asserts the JSON responses are equal modulo the MCP-only fields. Pins the “tool is a wrapper, not a reimplementation” invariant. If anyone forks the matcher logic into the tool, this test fails.
Schema-validation test for unknown keys. Call the tool with signals_dict = {"vix_close": 22.4, "totally_made_up_key": 999}. Assert that (a) the call succeeds, (b) a WARNING was logged matching the unknown-key pattern, (c) the response is identical to the same call without the unknown key. Pins the “loud but graceful” contract from Decision 1.
dim_trust override round-trip. Call with dim_trust={"hy_oas": 1.0, "pct_above_50sma": 1.0} and a fixed signals_dict, then call again with dim_trust={} (disable trust entirely). Assert the distances reported in analogue_dates differ — proves the override actually reaches the matcher.
include_analogue_dates flag test. Call with include_analogue_dates=false (default) and assert analogue_dates is absent. Call with true and assert it’s present, sorted by distance ascending, and the entry count equals sample_size.
MATCHER_VERSION emission. Every response carries matcher_version. Pin via a tiny assertion in the contract test — if a future refactor drops the field, the test catches it.
KB article presence test. Add a row to scripts/verify-docs.sh (or the existing KB-presence test in tests/) confirming ui/src/content/kb/mcp/compare-to-history.mdx exists. Per P3, every non-trivial change includes its KB article in the same PR — the test enforces it stays.
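Several of the assertions above can share one helper. The sketch below is illustrative only: the helper name is hypothetical, and in the real suite the response would come from calling the tool rather than from a hand-built dict.

```python
def check_analogue_contract(response, include_analogue_dates):
    """Assert the analogue_dates and matcher_version contract on one response."""
    assert "matcher_version" in response, "matcher_version must always be emitted"
    if not include_analogue_dates:
        assert "analogue_dates" not in response
        return
    entries = response["analogue_dates"]
    distances = [entry["distance"] for entry in entries]
    assert distances == sorted(distances), "entries must be sorted by distance ascending"
    assert len(entries) == response["sample_size"]

# Hand-built stand-in for a tool response with include_analogue_dates=true.
fake = {
    "matcher_version": "placeholder-tag",
    "sample_size": 2,
    "analogue_dates": [
        {"date": "2024-09-12", "distance": 0.842},
        {"date": "2025-02-04", "distance": 0.917},
    ],
}
check_analogue_contract(fake, include_analogue_dates=True)
```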
