Signal Validation

Last verified: 2026-05-26

This platform tracks dozens of market signals. A fair question from anyone who relies on it: do these signals actually predict anything, or are they just well-dressed narrative? We take that question seriously enough to test it adversarially — and to publish what we find, including the parts that did not work. This page is the standing record of that testing: the method, what we expected going in, and what the data actually said.

Why we test

Every signal and every weight should earn its place with evidence, not intuition. A dashboard full of plausible-sounding indicators is easy to build and easy to fool yourself with.
Publishing negative results keeps us honest, and keeps us from re-running experiments that have already been settled.
The results explain the platform’s framing: why we lead with market conditions, risk, and move size rather than sell a direction call.

The testing standard

We hold every claim to an out-of-sample bar. In plain terms, the rules we test under:

Out-of-sample, not in-sample. A model that fits the past perfectly is worthless if it can’t handle data it has never seen. We always score on held-out periods, never on the data the model was fit to.
Walk-forward / time-split. Markets are time-ordered — you can’t shuffle the days. We train on earlier periods and test on later ones (and the reverse), so a result has to survive being tested on a market the model wasn’t trained on.
Multiple market regimes. Our daily history spans about five years, including the 2022 bear market. A signal that only “works” in a rising market isn’t a signal — it’s a description of a rising market.
Leakage guards. The most common way to fool yourself is to let information from the future leak into the inputs. Forward returns are computed strictly forward; every input is strictly as-of the decision date.
Multiple-comparison correction. Test enough signals and a few will look significant by pure chance. We apply Bonferroni correction, so a “winner” has to clear a far higher bar than a single lucky test.
Overlap-aware significance. Overlapping multi-day windows inflate apparent significance. We adjust for it (Newey-West and non-overlapping subsamples) rather than trusting naive statistics.
Parsimony over kitchen-sink. Large models with many inputs overfit and swing wildly out-of-sample. Small, interpretable models are both more honest and more stable — and when a big model beats a small one only in-sample, that’s a red flag, not a discovery.
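Several of the rules above are mechanical enough to sketch in code. The block below is a minimal illustration in Python, not the platform's actual test harness: every function name and parameter (`forward_returns`, `walk_forward_splits`, `non_overlapping`, `bonferroni_alpha`, `gap`, `min_train`) is a hypothetical chosen for this example. It shows forward returns computed strictly forward, expanding-window splits in which training data always precedes test data, a non-overlapping subsample for multi-day windows, and the Bonferroni-adjusted significance threshold. It covers only the train-early, test-late direction, and it does not attempt the Newey-West adjustment.

```python
from typing import Iterator, List, Optional, Sequence, Tuple


def forward_returns(prices: Sequence[float], horizon: int) -> List[Optional[float]]:
    """Return over the NEXT `horizon` days, aligned to the decision date.

    The last `horizon` entries are None because their future is not yet
    known; they must never be used as labels.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    n = len(prices)
    return [
        prices[t + horizon] / prices[t] - 1.0 if t + horizon < n else None
        for t in range(n)
    ]


def walk_forward_splits(
    n_obs: int, n_folds: int, min_train: int, gap: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges with train strictly before test.

    `gap` drops the most recent training rows. Setting it to at least the
    label horizon keeps training labels from reaching into the test window.
    """
    if gap >= min_train:
        raise ValueError("gap must be smaller than min_train")
    test_size = (n_obs - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        test_start = min_train + k * test_size
        yield range(0, test_start - gap), range(test_start, test_start + test_size)


def non_overlapping(values: Sequence[float], horizon: int) -> List[float]:
    """Keep every `horizon`-th observation so multi-day windows do not overlap."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return list(values[::horizon])


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests
```

As a usage example under these assumptions, `walk_forward_splits(100, 3, 40, 5)` yields three folds whose training ranges each end five rows before the matching test range begins, and `bonferroni_alpha(0.05, 50)` shows that with fifty signals tested, a single signal has to clear a threshold fifty times stricter than the nominal 5% level.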

What we tested, what we expected, what we got

Question	What we expected	What the data said
Predict next-day direction (up / down)?	Hoped for an edge over simply holding	No edge. Out-of-sample, no model beats the baseline that “the market drifts up over time.” Stated conviction was, if anything, slightly anti-correlated with being right.
Predict next-day move size?	Untested, worth a shot	No edge at the one-day scale. Out-of-sample, models predicting next-day magnitude did no better than a naive constant baseline.
Read forward move size / volatility over the coming weeks?	Followed once the one-day test failed	A real, VIX-independent signal. Several positioning and macro signals that carry no directional edge — dealer gamma, breadth, sentiment surveys — do carry information about how big forward moves run (forward realized volatility), and the read survives controlling for the current VIX level. This is the magnitude analogue of the drawdown-risk lane. Research-evidenced, not yet a shipped score — it still needs to hold across a second stress regime before it becomes a product surface.
Do options-flow signals (dark-pool index, dealer gamma, 0DTE put/call) predict direction?	A popular market thesis	Weak. These rank among the least predictive of every signal tested. They describe market structure well; they do not time it.
Does signed option order-flow predict direction?	The most compelling untested idea	Inconclusive — and not yet testable. The directional version of this signal isn’t in our history yet; a simple proxy added nothing beyond the ordinary put/call ratio. We’re capturing the real version going forward to test once enough history accrues.
Do credit & inflation leaders flag elevated multi-week drawdown risk?	Academic literature says they lead at a 1–3 month horizon	A real, if fragile, signal. Inflation breakevens, high-yield credit spreads, and real yields carry low-confidence information about elevated drawdown risk over the following weeks. A probabilistic risk gauge — not a timing trigger.
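The phrase "survives controlling for the current VIX level" in the table can be made concrete with a partial correlation. The sketch below is one common way to run such a check, not necessarily the method used for the results above: it regresses both the signal and the forward realized volatility on the control, then correlates what is left over. The function names and the three input series are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def residualize(y: Sequence[float], x: Sequence[float]) -> List[float]:
    """Residuals from a one-variable OLS fit of y on x, with intercept."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("control series has no variation")
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - beta * mx
    return [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series."""
    ma, mb = _mean(a), _mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    if va == 0 or vb == 0:
        raise ValueError("a series has no variation")
    return cov / sqrt(va * vb)


def partial_correlation(
    signal: Sequence[float], target: Sequence[float], control: Sequence[float]
) -> float:
    """Correlation of signal and target after removing the linear effect
    of the control (for example, the current VIX level) from both."""
    return correlation(residualize(signal, control), residualize(target, control))
```

A partial correlation near zero would mean the signal adds nothing beyond what the control already explains; a value that stays clearly away from zero out-of-sample is the kind of evidence the "VIX-independent" label describes. With overlapping multi-week windows, the significance of that value still needs the overlap adjustment described in the testing standard.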

What this means for the platform

We don’t sell a direction call or a magic score. The evidence says which way price goes isn’t predictable from this data, and we’d rather tell you that plainly than dress it up.
We lead with the reads the evidence supports. The validated lanes are risk and magnitude: inflation breakevens and credit spreads flag multi-week drawdown risk, and positioning, breadth, and sentiment signals read forward move size. The popular flow signals carry real magnitude information even though they don’t time direction, so we keep them for that purpose and do not weight them as if they called market direction.
The health reading is a conditions gauge, deliberately. It confirms and describes the market state; it does not claim to lead it. A signal that lags price is still useful for understanding where you are — it just isn’t a forecast.

What we won’t chase again

Transparency cuts both ways. These avenues are tested and shelved:

Buying more options-flow or short-interest data. It’s the same family of signal that already tests weak; more of it won’t manufacture an edge.
A “predictive score” that calls market direction. Repeatedly disproven across independent tests. We won’t rebrand it and try again without genuinely new information.

What’s still open

Multi-week drawdown risk. A promising, evidenced thread. Worth developing carefully as a probabilistic risk tilt, with continued out-of-sample validation as additional market regimes accumulate. We treat it as low-confidence until it has survived more than one stress period.
Forward move-size / volatility. The newest evidenced lane: positioning and macro signals read how big forward moves run, VIX-independently, and the read held up under a richer volatility-surface control. It needs to reproduce across a second stress regime before it ships as a product surface — until then it’s a research verdict, not a dial on the dashboard.
Signed order-flow, captured going forward. We’ll re-test the real, direction-aware version once enough history exists to judge it fairly.
Longer horizons in general. The signals that matter shift with the horizon, and our shortest-horizon tests were the harshest. The monthly scale is where the credit and inflation leaders live, and it’s where we expect the next genuine findings — if there are any.

We update this page when new tests run. If a signal earns its place, the evidence will be here. If it doesn’t, that will be here too.

Signals: breakeven_5y, hy_oas, real_yield_10y, gex, dix