Signal Validation
This platform tracks dozens of market signals. A fair question from anyone who relies on it: do these signals actually predict anything, or are they just well-dressed narrative? We take that question seriously enough to test it adversarially — and to publish what we find, including the parts that did not work. This page is the standing record of that testing: the method, what we expected going in, and what the data actually said.
Why we test
- Every signal and every weight should earn its place with evidence, not intuition. A dashboard full of plausible-sounding indicators is easy to build and easy to fool yourself with.
- Publishing negative results keeps us honest, and keeps us from re-running experiments that have already been settled.
- It explains the platform’s framing: why we describe market conditions and risk rather than sell predictions.
The testing standard
We hold every claim to an out-of-sample bar. In plain terms, the rules we test under:
- Out-of-sample, not in-sample. A model that fits the past perfectly is worthless if it can’t handle data it has never seen. We always score on held-out periods, never on the data the model was fit to.
- Walk-forward / time-split. Markets are time-ordered, so you can’t shuffle the days. We train on earlier periods and test on later ones, and as a separate check we also train on later periods and test on earlier ones, so a result has to survive being tested on a market the model wasn’t trained on.
- Multiple market regimes. Our daily history spans about five years, including the 2022 bear market. A signal that only “works” in a rising market isn’t a signal — it’s a description of a rising market.
- Leakage guards. The most common way to fool yourself is to let information from the future leak into the inputs. Forward returns are computed strictly forward; every input is strictly as-of the decision date.
- Multiple-comparison correction. Test enough signals and a few will look significant by pure chance. We apply Bonferroni correction, which divides the significance threshold by the number of tests run, so a “winner” has to clear a far higher bar than a single lucky test.
- Overlap-aware significance. Overlapping multi-day windows share most of their days, so neighboring observations are not independent and naive statistics overstate significance. We adjust for it with Newey-West standard errors and non-overlapping subsamples.
- Parsimony over kitchen-sink. Large models with many inputs overfit and swing wildly out-of-sample. Small, interpretable models are both more honest and more stable — and when a big model beats a small one only in-sample, that’s a red flag, not a discovery.
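Several of the rules above are mechanical enough to sketch in code. The following is a minimal illustration under stated assumptions, not the platform's actual pipeline: the function names, the expanding-window fold scheme, the `gap` parameter, and the price series are all hypothetical. It shows strictly forward labels, time-ordered splits, a Bonferroni threshold, and a non-overlapping subsample.

```python
from typing import Iterator, List, Optional, Tuple


def forward_returns(prices: List[float], horizon: int) -> List[Optional[float]]:
    """Return over the NEXT `horizon` days, aligned to the decision date.

    The last `horizon` entries are None: their future is not known yet,
    so they can never be used as labels.
    """
    out: List[Optional[float]] = []
    for t in range(len(prices)):
        if t + horizon < len(prices):
            out.append(prices[t + horizon] / prices[t] - 1.0)
        else:
            out.append(None)
    return out


def walk_forward_splits(
    n: int, n_folds: int, min_train: int, gap: int = 0
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges with train strictly before test.

    `gap` (an assumption of this sketch) skips the days between train and
    test so multi-day labels at the end of the training window cannot
    reach into the test window.
    """
    test_size = (n - min_train - gap) // n_folds
    if test_size <= 0:
        raise ValueError("not enough data for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size  # expanding training window
        test_start = train_end + gap
        yield range(0, train_end), range(test_start, test_start + test_size)


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def non_overlapping(values: List[Optional[float]], horizon: int) -> List[float]:
    """Keep every `horizon`-th forward return so the windows do not overlap."""
    return [v for v in values[::horizon] if v is not None]


# Made-up prices, purely for illustration.
prices = [100.0, 101.0, 99.0, 102.0, 103.0, 101.0, 104.0, 105.0, 103.0, 106.0]
labels = forward_returns(prices, horizon=2)

for train, test in walk_forward_splits(len(prices), n_folds=2, min_train=4, gap=2):
    print("train", list(train), "test", list(test))

print("per-test alpha for 40 signals:", bonferroni_alpha(0.05, 40))
print("independent 2-day returns:", len(non_overlapping(labels, horizon=2)))
```

The non-overlapping subsample is the blunter of the two overlap adjustments: it discards data but needs no modeling assumptions. A Newey-West estimate keeps every observation and corrects the standard error instead, and is better taken from an established statistics library than written by hand.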
What we tested, what we expected, what we got
| Question | What we expected | What the data said |
|---|---|---|
| Predict next-day direction (up / down)? | Hoped for an edge over simply holding | No edge. Out-of-sample, no model beats the baseline that “the market drifts up over time.” Stated conviction was, if anything, slightly anti-correlated with being right. |
| Predict next-day move size / volatility? | Untested, worth a shot | No edge. Out-of-sample, models predicting magnitude did no better than — and sometimes worse than — a naive constant baseline. |
| Do options-flow signals (dark-pool index, dealer gamma, 0DTE put/call) predict direction? | A popular market thesis | Weak. These rank among the least predictive of every signal tested. They describe market structure well; they do not time it. |
| Does signed option order-flow predict direction? | The most compelling untested idea | Inconclusive — and not yet testable. The directional version of this signal isn’t in our history yet; a simple proxy added nothing beyond the ordinary put/call ratio. We’re capturing the real version going forward to test once enough history accrues. |
| Do credit & inflation leaders flag elevated multi-week drawdown risk? | Academic literature says they lead at a 1–3 month horizon | A real, if fragile, signal. Inflation breakevens, high-yield credit spreads, and real yields carry low-confidence information about elevated drawdown risk over the following weeks. A probabilistic risk gauge — not a timing trigger. |
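The "no edge" verdicts in the table rest on comparing a model against a naive baseline on held-out days. A minimal sketch of that comparison for the direction question follows. The function names and the sample data are hypothetical and are not results from our tests.

```python
from typing import Sequence


def hit_rate(predicted_up: Sequence[bool], actual_up: Sequence[bool]) -> float:
    """Fraction of days on which the predicted direction matched the outcome."""
    if len(predicted_up) != len(actual_up) or len(actual_up) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    hits = sum(p == a for p, a in zip(predicted_up, actual_up))
    return hits / len(actual_up)


def edge_over_always_up(
    predicted_up: Sequence[bool], actual_up: Sequence[bool]
) -> float:
    """Model hit rate minus the hit rate of always predicting 'up'.

    Always predicting 'up' encodes the baseline that the market drifts
    up over time. A positive value means the model beat it on this sample.
    """
    baseline = [True] * len(actual_up)
    return hit_rate(predicted_up, actual_up) - hit_rate(baseline, actual_up)


# Made-up held-out outcomes and model calls, purely for illustration.
actual = [True, True, False, True, False, True, True, False]
model = [True, False, False, True, True, True, False, False]

print("model hit rate:", hit_rate(model, actual))
print("edge over always-up:", edge_over_always_up(model, actual))
```

In this toy sample the model is right on 5 of 8 days, which looks better than a coin flip, yet the always-up baseline is also right on 5 of 8, so the edge is zero. That is why we score against the drift baseline rather than against 50%.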
What this means for the platform
- We don’t sell “predictions” or a magic score. The evidence says next-day direction isn’t predictable from this data, and we’d rather tell you that plainly than dress it up.
- We weight the signals the evidence supports. Inflation breakevens and credit spreads — the multi-week drawdown leaders — carry real weight. The popular flow signals that test weak are kept for context, not over-weighted as if they timed the market.
- The health reading is a conditions gauge, deliberately. It confirms and describes the market state; it does not claim to lead it. A signal that lags price is still useful for understanding where you are — it just isn’t a forecast.
What we won’t chase again
Transparency cuts both ways. These avenues are tested and shelved:
- Buying more options-flow or short-interest data. It’s the same family of signal that already tests weak; more of it won’t manufacture an edge.
- A “predictive score” that calls market direction. Repeatedly disproven across independent tests. We won’t rebrand it and try again without genuinely new information.
What’s still open
- Multi-week drawdown risk. The one promising thread. Worth developing carefully as a probabilistic risk tilt, with continued out-of-sample validation as additional market regimes accumulate. We treat it as low-confidence until it has survived more than one stress period.
- Signed order-flow, captured going forward. We’ll re-test the real, direction-aware version once enough history exists to judge it fairly.
- Longer horizons in general. The signals that matter shift with the horizon, and our shortest-horizon tests were the harshest. The monthly scale is where the credit and inflation leaders live, and it’s where we expect the next genuine findings — if there are any.
We update this page when new tests run. If a signal earns its place, the evidence will be here. If it doesn’t, that will be here too.