MCP tool: compare_to_history

What this tool does

The base-rate matcher is the platform’s flagship analytical surface — “when markets looked like this in the past, what happened next.” It runs over the full daily_signals history, scores each historical row against today’s signal vector using weighted Euclidean distance (or legacy hard-bin equality), and returns the forward 1d / 3d / 5d SPY return distribution across the top-K analogues.
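The soft-mode scoring idea can be sketched as follows. This is a minimal illustration, not the contents of app/signals/base_rates.py: it assumes every dimension is already normalized to a comparable scale, and that a dimension missing on either side contributes nothing to the distance sum. The names weighted_distance, top_k_analogues, and the weights dict are hypothetical.

```python
import math

def weighted_distance(today, row, weights):
    """Weighted Euclidean distance over the dims present in both vectors."""
    total = 0.0
    for dim, weight in weights.items():
        a, b = today.get(dim), row.get(dim)
        if a is None or b is None:
            continue  # a missing dim contributes nothing to the sum
        total += weight * (a - b) ** 2
    return math.sqrt(total)

def top_k_analogues(today, history, weights, k):
    """Return (distance, row) pairs for the k closest rows, nearest first."""
    scored = [(weighted_distance(today, row, weights), row) for row in history]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]
```

The forward-return distribution would then be aggregated over the rows this returns; that aggregation step is not shown here.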

compare_to_history is the MCP exposure of that matcher. An AI agent calls it with either today’s live signal dict or a hypothetical scenario (“what if VIX were 35 and DIX were 0.38?”) and reads back the analogue forward-return stats. No dashboard scrape, no manual API juggling — one tool, one round-trip, the same numbers the dashboard surfaces.

The tool is a thin wrapper. It does not own matcher logic. Every numeric decision — distance metric, dim trust, recency decay, bootstrap CI — lives in app/signals/base_rates.py and reaches the tool through compute_base_rates(). When the matcher evolves (Stream C2 hybrid tags + Euclidean), the tool inherits the change without a client edit.

C2 mode dispatch (2026-05)

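MATCHER_VERSION was bumped to 2026.05.8-hybrid-13d-recency730 when Stream C2 landed. Three modes joined the catalog alongside the legacy hard / soft: euclidean (an alias of soft), tag (coarse-tag pre-filter only, ranked by recency), and hybrid (tag pre-filter, then Euclidean rank within the filtered set).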

When the tag filter under-shoots _HYBRID_MIN_ANALOGUES (30 analogues) the matcher falls back to Euclidean over the unfiltered history. The response carries hybrid_fallback=true plus a hybrid_fallback_reason string so consumers see the degradation. Per DOCTRINE P0 — no silent fallback.
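A minimal sketch of that fallback decision, assuming "under-shoots" means strictly fewer than 30 rows survive the tag filter. The function name, the matches_tags predicate, and the wording of the reason string are hypothetical; only the hybrid_fallback and hybrid_fallback_reason field names come from this article.

```python
HYBRID_MIN_ANALOGUES = 30  # stand-in for the matcher's _HYBRID_MIN_ANALOGUES

def select_hybrid_candidates(history, matches_tags, min_analogues=HYBRID_MIN_ANALOGUES):
    """Apply the tag pre-filter; fall back to the full history if it under-shoots."""
    filtered = [row for row in history if matches_tags(row)]
    if len(filtered) >= min_analogues:
        return filtered, {"hybrid_fallback": False}
    # Too few tagged analogues: rank over the unfiltered history and say so.
    reason = f"tag filter matched {len(filtered)} rows, below minimum {min_analogues}"
    return list(history), {"hybrid_fallback": True, "hybrid_fallback_reason": reason}
```

Returning the flags alongside the candidate set keeps the degradation visible to whatever builds the response, rather than burying it in a log line.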

Three design decisions

These are locked by D9 in IDEAS.md. Each was a real fork; the rejected options are recorded for the same reason DOCTRINE.md records D1–D16 — so a future session understands why we chose what we chose.

1. Argument shape — full signal dict

Options considered.

Decision — full dict, validated against signal_definitions.json::api_field_path. Unknown keys are accepted but logged at WARNING and ignored by the matcher. The matcher already handles missing dims gracefully (each contributes nothing to the distance sum); silent-drop is the existing behavior. The new bit is the WARNING log so unknown keys are loud in observability without breaking the call.

Rationale. Keeps the tool open to hypothetical scenarios. Makes silent dimension drift loud: a renamed column showing up as “unknown key” in logs is exactly the signal we want. Matches DOCTRINE P0 (no silent degradation). The validator is a single pass over the registry, so it is cheap.

2. Return shape — summary by default, analogue dates on request

Options considered.

Decision — both, gated by include_analogue_dates: bool = false. Default false to keep responses small. When true, the response appends analogue_dates: [{date, distance, weight, forward_1d, forward_3d, forward_5d}, ...] — one entry per analogue, sorted by distance ascending. The existing HTTP route already supports this via a query flag; the MCP tool surfaces the same pattern.

Rationale. Tool agents and human-facing agents have different needs. A scheduling agent calling compare_to_history every 15 minutes wants the headline cheap; a research agent building a thesis wants the dates. Default optimizes for the common case; the flag opens the deeper read without two tools to maintain.

3. Mode selection — respect the live default, accept overrides

Options considered.

Decision: respect the live default, accept the same overrides as the HTTP route. Today that means mode="soft", top_k=100, recency_half_life_days=730, backfill_weight=1.0, with dim_trust as an optional override dict. Stream C2 has landed but the default is still soft; if the default later flips to hybrid, the tool picks it up with no client change required.

Rationale. Future-proof. Clients never have to reason about which matcher version they’re calling — they get whatever the platform’s current best default is. The override knobs are there for agents that want a specific comparison (e.g. mode="hard" to compare a soft-mode prediction against the legacy bin-equality matcher).

Argument schema

The tool signature mirrors the HTTP route surface plus the signals_dict argument. JSON schema:

{
  "name": "compare_to_history",
  "description": "Match a live or hypothetical signal vector against daily_signals history. Returns forward-return distributions (1d / 3d / 5d SPY) for similar historical setups.",
  "input_schema": {
    "type": "object",
    "properties": {
      "signals_dict": {
        "type": "object",
        "description": "Signal vector to match against history. Keys must be daily_signals column names (validated against signal_definitions.json). Unknown keys are accepted but logged at WARNING and ignored by the matcher. Omit to use the latest live row from daily_signals.",
        "additionalProperties": true
      },
      "horizon": {
        "type": "integer",
        "enum": [1, 3, 5],
        "description": "Which forward-return horizon to highlight. Optional; the response always carries all three (forward_1d, forward_3d, forward_5d). When set, the response also includes a top-level headline reading for the requested horizon."
      },
      "mode": {
        "type": "string",
        "enum": ["soft", "hard", "euclidean", "tag", "hybrid"],
        "description": "Matcher mode. Omit to use the live default (currently 'soft'). 'soft' / 'euclidean' are aliases — weighted Euclidean distance over the full signal vector. 'hard' = legacy bin-equality. 'tag' (C2, 2026-05) = coarse-tag pre-filter only, ranked by recency. 'hybrid' (C2) = tag pre-filter then Euclidean rank within the filtered set; option C per DOCTRINE §6 Q1. When the tag filter under-shoots _HYBRID_MIN_ANALOGUES the matcher falls back to Euclidean and surfaces hybrid_fallback=true on the response (loud-fail per DOCTRINE P0)."
      },
      "top_k": {
        "type": "integer",
        "minimum": 1,
        "maximum": 500,
        "description": "Euclidean / tag analogue-set cap. Omit to use the live default (currently 100)."
      },
      "recency_half_life_days": {
        "type": "integer",
        "minimum": 0,
        "description": "Exponential decay of analogue weight by trade-date age. 0 disables decay. Omit to use the live default (currently 730 = 2-year half-life)."
      },
      "backfill_weight": {
        "type": "number",
        "minimum": 0.0,
        "maximum": 1.0,
        "description": "Weight applied to rows tagged report_filename='historical-backfill'. Live rows always weight 1.0. Omit to use the live default (currently 1.0)."
      },
      "dim_trust": {
        "type": "object",
        "description": "Per-dim trust multiplier override (e.g. {'pct_above_50sma': 1.0, 'hy_oas': 1.0}). Pass {} to disable the default proxy-trust down-weighting entirely. Omit to use the live _DIM_TRUST_DEFAULT.",
        "additionalProperties": { "type": "number" }
      },
      "include_analogue_dates": {
        "type": "boolean",
        "default": false,
        "description": "When true, the response appends analogue_dates: [{date, distance, weight, forward_1d, forward_3d, forward_5d}, ...]. Adds payload size; off by default."
      }
    }
  }
}

Validation rules

  1. Unknown signals_dict keys. Every key is checked against signal_definitions.json::api_field_path and the daily_signals column set. Unknown keys log a single WARNING event per call (logger.warning("compare_to_history: unknown signal key %r ignored", key)) and drop out of the call to compute_base_rates(). The matcher already tolerates missing dims; this is the existing graceful path.
  2. Type coercion. Numeric dims that arrive as strings ("22.4") are coerced once; failure logs a WARNING and drops the key. Categorical dims are taken verbatim.
  3. Missing signals_dict entirely. When omitted, the tool reads SELECT * FROM daily_signals ORDER BY trade_date DESC LIMIT 1 — the same path the HTTP route uses. This is the “match today against history” common case.
  4. Override-knob bounds. top_k, recency_half_life_days, backfill_weight use the same bounds the HTTP route declares. Out-of-bounds values raise a structured error before reaching the matcher.
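Rules 1 and 2 can be sketched as one pass over the incoming dict. This is an illustration under stated assumptions: numeric_dims and categorical_dims are hypothetical sets standing in for the lookup against signal_definitions.json, the logger name is made up, and the coercion-failure message is invented for the example. The unknown-key message follows the logger call quoted in rule 1 and is emitted once per unknown key.

```python
import logging

logger = logging.getLogger("compare_to_history")  # hypothetical logger name

def clean_signals(signals, numeric_dims, categorical_dims):
    """Drop unknown keys and coerce numeric strings, warning on anything dropped."""
    cleaned = {}
    for key, value in signals.items():
        if key in categorical_dims:
            cleaned[key] = value  # categorical dims are taken verbatim
        elif key in numeric_dims:
            try:
                cleaned[key] = float(value)  # "22.4" becomes 22.4
            except (TypeError, ValueError):
                logger.warning(
                    "compare_to_history: non-numeric value %r for %r dropped", value, key
                )
        else:
            logger.warning("compare_to_history: unknown signal key %r ignored", key)
    return cleaned
```

Only the cleaned dict would be forwarded to compute_base_rates(), so the matcher sees dropped keys as missing dims.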

Return schema

The base response is the compute_base_rates() payload verbatim, JSON-serialized. Example with default args:

{
  "matcher_version": "2026.05.8-hybrid-13d-recency730",
  "mode": "soft",
  "trade_date": "2026-05-20",
  "sample_size": 100,
  "live_count": 67,
  "backfill_count": 33,
  "backfill_weight": 1.0,
  "recency_half_life_days": 730,
  "total_history_days": 1319,
  "match_dimensions": 16,
  "matched_bins": {
    "regime": "CAUTIOUS",
    "vix_close": "VIX_MED",
    "dix": "DIX_NEUTRAL",
    "hy_oas": "CREDIT_NORMAL"
  },
  "top_k": 100,
  "distance_min": 0.842,
  "distance_max": 3.117,
  "distance_mean": 1.984,
  "forward_1d": {
    "count": 98, "effective_count": 76.4,
    "mean": 0.12, "median": 0.18, "stdev": 0.94,
    "positive_pct": 56.1,
    "p5": -1.61, "p25": -0.34, "p75": 0.71, "p95": 1.42,
    "worst": -2.87, "best": 2.41,
    "ci_low_95": -0.07, "ci_high_95": 0.31
  },
  "forward_3d": { /* same shape */ },
  "forward_5d": { /* same shape */ },
  "regime_distribution": {
    "CAUTIOUS": 54, "TRANSITIONAL": 28, "RISK-ON": 18
  },
  "regime_breakdown": {
    "CAUTIOUS": { "count": 54, "forward_1d": {...}, "forward_3d": {...}, "forward_5d": {...} },
    "TRANSITIONAL": { "count": 28, "forward_1d": {...}, "forward_3d": {...}, "forward_5d": {...} },
    "RISK-ON": { "count": 18, "forward_1d": {...}, "forward_3d": {...}, "forward_5d": {...} }
  }
}

With include_analogue_dates=true

Appends:

{
  "analogue_dates": [
    { "date": "2024-09-12", "distance": 0.842, "weight": 0.91,
      "forward_1d": 0.34, "forward_3d": 0.82, "forward_5d": 1.14 },
    { "date": "2025-02-04", "distance": 0.917, "weight": 0.96,
      "forward_1d": -0.18, "forward_3d": 0.41, "forward_5d": 0.93 },
    /* ... up to top_k entries, sorted by distance ascending ... */
  ]
}

Each entry carries the matched date, the raw distance the matcher computed, the final analogue weight (after backfill_weight × recency decay), and the forward returns the matcher used in the headline aggregates. The agent can pick the three closest analogues, drill into their reports, and read the actual narrative for each.
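As a rough illustration of how backfill_weight and recency decay could combine into one analogue weight, here is the textbook half-life form. It is an assumption, not the matcher's code: the real formula in app/signals/base_rates.py may measure age differently or fold in other terms, and this sketch is not calibrated to reproduce the example weights shown above. The function name and parameters are hypothetical.

```python
from datetime import date

def analogue_weight(trade_date, as_of, half_life_days=730,
                    is_backfill=False, backfill_weight=1.0):
    """Weight = (backfill_weight or 1.0 for live rows) x exponential recency decay."""
    age_days = max((as_of - trade_date).days, 0)
    # A half-life of 0 disables decay, matching the schema description.
    decay = 1.0 if half_life_days == 0 else 0.5 ** (age_days / half_life_days)
    base = backfill_weight if is_backfill else 1.0
    return base * decay
```

Under this form a live row exactly one half-life old weighs 0.5, and a backfilled row of the same age weighs 0.5 times backfill_weight.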

With horizon set

Adds a top-level shortcut block so the agent doesn’t have to pick the right forward_Nd key:

{
  "horizon": 3,
  "headline": {
    "mean": 0.34, "median": 0.41,
    "positive_pct": 61.2,
    "ci_low_95": 0.08, "ci_high_95": 0.59,
    "sample_size": 95
  }
}
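A sketch of how the shortcut block could be derived from the matching forward_Nd entry. The helper name build_headline is hypothetical, and mapping headline.sample_size to the horizon's count is an inference from the examples in this article (a 3d count of 95 alongside a headline sample_size of 95), not a documented rule.

```python
HEADLINE_FIELDS = ("mean", "median", "positive_pct", "ci_low_95", "ci_high_95")

def build_headline(response, horizon):
    """Copy the headline stats for one horizon into a top-level shortcut block."""
    if horizon not in (1, 3, 5):
        raise ValueError(f"horizon must be 1, 3, or 5, got {horizon!r}")
    stats = response[f"forward_{horizon}d"]
    headline = {field: stats[field] for field in HEADLINE_FIELDS}
    headline["sample_size"] = stats["count"]  # inferred mapping, see lead-in
    return {"horizon": horizon, "headline": headline}
```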

Mode selection

The HTTP route’s defaults are the tool’s defaults. Always.

mode = "soft"
top_k = 100
recency_half_life_days = 730
backfill_weight = 1.0
dim_trust = _DIM_TRUST_DEFAULT

These are read from the same constants compute_base_rates() uses. When the matcher evolves (the default flips to hybrid, a future tweak bumps top_k to 150, the dim trust map gets a new entry), the MCP tool inherits the change in the next deploy. The client makes the call exactly the same way and gets the new behavior. No tool-side defaults to update, no contract to renegotiate.

MATCHER_VERSION is emitted on every response (current value: "2026.05.8-hybrid-13d-recency730"). The version tag is the agent’s hook for detecting matcher upgrades. An agent that pins a thesis against a specific matcher version can compare response.matcher_version to its expected value and re-run if the matcher has bumped.
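On the agent side, the version check can be as small as this sketch. The helper name and the idea of storing a pinned version next to the thesis are assumptions for illustration; only the matcher_version field and its current value come from this article.

```python
PINNED_VERSION = "2026.05.8-hybrid-13d-recency730"  # version the thesis was built on

def matcher_changed(response, pinned_version=PINNED_VERSION):
    """True when the response came from a different matcher than the pinned one."""
    return response.get("matcher_version") != pinned_version
```

A missing matcher_version field also counts as a change here, which errs on the side of re-running.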

Bump rules for MATCHER_VERSION are defined in app/signals/base_rates.py — bump on mode-default flip, default top_k / recency change, soft dim set change, bin-threshold change, distance-metric semantics change. Don’t bump on behavior-identical refactors. After a bump, python scripts/backfill_base_rate_predictions.py --force re-seeds the calibration substrate.

Example usage

Example A — live signals, default args

The most common call. Read today’s row, match against history, return the headline.

Request:

{ "tool": "compare_to_history", "arguments": {} }

Response (abbreviated):

{
  "matcher_version": "2026.05.8-hybrid-13d-recency730",
  "mode": "soft",
  "trade_date": "2026-05-20",
  "sample_size": 100,
  "forward_1d": { "mean": 0.12, "median": 0.18, "positive_pct": 56.1, "ci_low_95": -0.07, "ci_high_95": 0.31, "count": 98 },
  "forward_3d": { "mean": 0.34, "median": 0.41, "positive_pct": 61.2, "ci_low_95": 0.08, "ci_high_95": 0.59, "count": 95 },
  "forward_5d": { "mean": 0.51, "median": 0.62, "positive_pct": 64.8, "ci_low_95": 0.21, "ci_high_95": 0.84, "count": 93 },
  "regime_distribution": { "CAUTIOUS": 54, "TRANSITIONAL": 28, "RISK-ON": 18 }
}

The agent reads: “100 analogues, modest positive lean across all three horizons, CI on 3d return excludes zero — moderate conviction long.” Cheap, fast, one call.
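The reading above comes down to checking whether each horizon's 95% CI straddles zero. A minimal sketch of that check, with hypothetical helper names and labels ("positive", "negative", "inconclusive") that are not part of the tool's response:

```python
def ci_lean(stats):
    """Classify one forward_Nd block by where its 95% CI sits relative to zero."""
    if stats["ci_low_95"] > 0:
        return "positive"
    if stats["ci_high_95"] < 0:
        return "negative"
    return "inconclusive"  # the interval includes zero

def read_leans(response):
    """Map each forward_Nd key present in the response to its CI lean."""
    keys = ("forward_1d", "forward_3d", "forward_5d")
    return {key: ci_lean(response[key]) for key in keys if key in response}
```

Applied to the abbreviated response above, the 1d horizon reads as inconclusive while 3d and 5d read as positive.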

Example B — hypothetical scenario with analogues + trust override

A research agent wants to know what historically happens when VIX climbs to 32 and DIX collapses to 0.39 in a CAUTIOUS regime — and wants the matched dates so it can read the actual reports. It also doesn’t want the matcher down-weighting the breadth-proxy dim, so it overrides dim_trust.

Request:

{
  "tool": "compare_to_history",
  "arguments": {
    "signals_dict": {
      "vix_close": 32.0,
      "dix": 0.39,
      "regime": "CAUTIOUS",
      "hy_oas": 4.1,
      "pct_above_50sma": 28,
      "gex": -2.4,
      "energy_regime": "STABLE"
    },
    "horizon": 5,
    "include_analogue_dates": true,
    "dim_trust": { "pct_above_50sma": 1.0, "hy_oas": 1.0 }
  }
}

Response (abbreviated):

{
  "matcher_version": "2026.05.8-hybrid-13d-recency730",
  "mode": "soft",
  "sample_size": 42,
  "horizon": 5,
  "headline": {
    "mean": -0.84, "median": -0.61, "positive_pct": 38.1,
    "ci_low_95": -1.42, "ci_high_95": -0.27,
    "sample_size": 41
  },
  "forward_1d": { /* ... */ },
  "forward_3d": { /* ... */ },
  "forward_5d": { "mean": -0.84, "median": -0.61, "positive_pct": 38.1, "ci_low_95": -1.42, "ci_high_95": -0.27, "count": 41 },
  "regime_distribution": { "CAUTIOUS": 31, "RISK-OFF": 11 },
  "analogue_dates": [
    { "date": "2022-09-13", "distance": 0.612, "weight": 0.41, "forward_1d": -1.94, "forward_3d": -3.12, "forward_5d": -4.27 },
    { "date": "2023-03-09", "distance": 0.701, "weight": 0.57, "forward_1d": -0.32, "forward_3d": 0.18, "forward_5d": 1.42 },
    { "date": "2024-04-15", "distance": 0.844, "weight": 0.82, "forward_1d": -0.94, "forward_3d": -1.21, "forward_5d": -0.84 }
  ]
}

The agent reads: “42 analogues for that setup, 5d mean −0.84% with CI excluding zero on the negative side, mostly CAUTIOUS with some RISK-OFF spillover — bearish lean. Top three closest historical matches were 2022-09-13 (sharp drop), 2023-03-09 (recovery), 2024-04-15 (mild drop). I’ll pull those three reports next.” The override to dim_trust ensures the breadth and HY OAS dims contribute at full weight rather than the default proxy-discount.

Implementation notes

For the implementer:

Testing requirements

Per P9 — every new behavior ships with tests in the same commit. The contract is non-trivial; tests must pin all three load-bearing surfaces.

  1. End-to-end contract test against the live HTTP route. A pytest in mcp/tests/test_tools_public.py calls compare_to_history with no args and /api/v1/signals/base-rates with no args (against the same backend), then asserts the JSON responses are equal modulo the MCP-only fields. Pins the “tool is a wrapper, not a reimplementation” invariant. If anyone forks the matcher logic into the tool, this test fails.
  2. Schema-validation test for unknown keys. Call the tool with signals_dict = {"vix_close": 22.4, "totally_made_up_key": 999}. Assert that (a) the call succeeds, (b) a WARNING was logged matching the unknown-key pattern, (c) the response is identical to the same call without the unknown key. Pins the “loud but graceful” contract from Decision 1.
  3. dim_trust override round-trip. Call with dim_trust={"hy_oas": 1.0, "pct_above_50sma": 1.0} and a fixed signals_dict, then call again with dim_trust={} (disable trust entirely). Assert the distances reported in analogue_dates differ — proves the override actually reaches the matcher.
  4. include_analogue_dates flag test. Call with include_analogue_dates=false (default) and assert analogue_dates is absent. Call with true and assert it’s present, sorted by distance ascending, and the entry count equals sample_size.
  5. MATCHER_VERSION emission. Every response carries matcher_version. Pin via a tiny assertion in the contract test — if a future refactor drops the field, the test catches it.
  6. KB article presence test. Add a row to scripts/verify-docs.sh (or the existing KB-presence test in tests/) confirming ui/src/content/kb/mcp/compare-to-history.mdx exists. Per P3, every non-trivial change includes its KB article in the same PR — the test enforces it stays.
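The assertions behind tests 4 and 5 can be sketched as a reusable helper. The helper name is hypothetical and the tool call itself is left out, since how the test suite invokes compare_to_history is not specified here; the helper only checks a response dict that has already been fetched.

```python
def assert_response_contract(response, analogue_dates_requested):
    """Check the matcher_version and analogue_dates invariants on one response."""
    # Test 5: every response carries matcher_version.
    assert "matcher_version" in response
    if not analogue_dates_requested:
        # Test 4, default path: the field is absent, not merely empty.
        assert "analogue_dates" not in response
        return
    entries = response["analogue_dates"]
    distances = [entry["distance"] for entry in entries]
    assert distances == sorted(distances)  # ascending by distance
    assert len(entries) == response["sample_size"]
```

In a pytest module this would be called once per tool invocation, with the second argument mirroring the include_analogue_dates value that was sent.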
